
Context

In this study, we examine air accidents and predict whether an accident is fatal or serious.

What factors affect the occurrence of air accidents?

To what extent are machine learning algorithms used in predicting air accidents?

Import Libraries For Overview & EDA

Get The Dataset

Overview Of The Dataset

Data Cleaning

Columns

91 = general aviation pilots & 121 = qualified pilots

EventID

Performance-Based Errors

Judgment & Decision-Making Errors

Violations

Teamwork

Mental Awareness

State of Mind

Physical Problems

Sensory Misperception

Physical Environment

Technological Environment

Inadequate Supervision

Planned Inappropriate Operations

Supervisory Violations

Resource Problems

Personnel Selection & Staffing

Climate/ Culture Influences

Policy & Process Issues

Technology Failure

Acts

Preconditions

Supervision

Organization

Fatal or Serious

Flight Segment 1=Taxi

91 = 1, 121 = 0

Drop Non-Important Columns
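
As a rough illustration of this step, the sketch below drops an identifier column with pandas. The rows are made up, and the choice of `EventID` as the column to drop is an assumption; the notebook does not state which columns it removes.

```python
import pandas as pd

# Made-up rows; only the column names come from the column list above.
df = pd.DataFrame({
    "EventID": ["A1", "B2", "C3"],
    "Violations": [0, 1, 0],
    "Fatal or Serious": [0, 1, 1],
})

# Assumption: identifier-like columns carry no predictive signal.
columns_to_drop = ["EventID"]
df = df.drop(columns=columns_to_drop)

print(df.columns.tolist())
```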

EDA: Exploratory Data Analysis

Auto EDA with Sweetviz

Train Valid Test Split

Train Dataset:

Set of data the model learns from, that is, the data used to fit the parameters of the machine learning model.

Valid Dataset:

Set of data used to provide an unbiased evaluation of a model fitted on the training dataset while tuning model hyperparameters. It also plays a role in other forms of model preparation, such as feature selection and threshold cut-off selection.

Test Dataset:

Set of data used to provide an unbiased evaluation of a final model fitted on the training dataset.
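
The three datasets described above can be produced with two calls to scikit-learn's `train_test_split`. This is a minimal sketch on synthetic data; the 60/20/20 proportions, the stratification, and the variable names are assumptions, not the notebook's actual settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption): 1000 rows, 5 features, about 20% positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.2).astype(int)

# First hold out 20% as the test set, keeping the class ratio (stratify).
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Then split the remaining 80% into 75/25, i.e. 60% train and 20% valid overall.
X_train, X_valid, y_train, y_valid = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42
)

print(len(X_train), len(X_valid), len(X_test))
```

Splitting off the test set first means it is never touched while hyperparameters are tuned on the valid set.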

Imbalanced Dataset

Balance Train Dataset
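
One simple way to balance a training set is random oversampling of the minority class, sketched below with NumPy on synthetic data. The notebook does not say which balancing technique it uses, so this is an illustrative option rather than the author's method. Only the training split is resampled, so the valid and test sets keep the original class distribution.

```python
import numpy as np

# Synthetic imbalanced training set (an assumption): 120 positives, 480 negatives.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(600, 5))
y_train = np.array([1] * 120 + [0] * 480)

minority_idx = np.flatnonzero(y_train == 1)
majority_idx = np.flatnonzero(y_train == 0)

# Draw minority rows with replacement until both classes have the same count.
extra_idx = rng.choice(
    minority_idx, size=len(majority_idx) - len(minority_idx), replace=True
)

# Combine and shuffle so the classes are not grouped together.
balanced_idx = rng.permutation(
    np.concatenate([majority_idx, minority_idx, extra_idx])
)
X_train_bal = X_train[balanced_idx]
y_train_bal = y_train[balanced_idx]

print(np.bincount(y_train_bal))
```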

Predictions

Logistic Regression

K Nearest Neighbors

Decision Tree

Hyperparameter Tuning For Decision Tree
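
A common way to tune a decision tree is an exhaustive grid search with cross-validation. The sketch below uses scikit-learn's `GridSearchCV` on synthetic data; the parameter grid, the F1 scoring choice, and the data are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the accident dataset (an assumption).
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Hypothetical search space; the notebook's actual grid is not shown.
param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 20],
    "criterion": ["gini", "entropy"],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring="f1",  # F1 is often preferred over accuracy on imbalanced labels
    cv=5,
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```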

Random Forest

Support Vector Machine

Bernoulli Naive Bayes

Bagging Classifier Algorithm

Gradient Boosting Classifier Algorithm

XGBoost Classifier

Voting Classifier
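
A voting classifier combines the predictions of several fitted models. The sketch below uses scikit-learn's `VotingClassifier` with soft voting, which averages predicted class probabilities. The choice of base estimators and the synthetic data are assumptions; the notebook's actual ensemble members are not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Synthetic data standing in for the accident dataset (an assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Soft voting requires every base estimator to implement predict_proba.
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("nb", BernoulliNB()),
    ],
    voting="soft",
)
voter.fit(X_train, y_train)

valid_accuracy = voter.score(X_valid, y_valid)
print(valid_accuracy)
```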

Test Models

0. Logistic Regression

1. Random Forest Classifier

2. XGBoost Classifier

3. Gradient Boosting Classifier

4. Bagging Classifier

5. Support Vector Machine

6. Decision Tree Classifier

7. Bernoulli Naive Bayes

8. Voting Classifier

Cross Validation: Evaluating Estimator Performance
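
Cross-validation estimates performance by fitting and scoring the model on several train/validation folds. A minimal sketch with scikit-learn's `cross_val_score` follows; the estimator, the five stratified folds, and the synthetic data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic imbalanced data standing in for the accident dataset (an assumption).
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Stratified folds keep the class ratio roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="accuracy"
)

print(scores.mean(), scores.std())
```

Reporting the mean together with the standard deviation shows how stable the estimate is across folds.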

Comparing Machine Learning Algorithms

Comparing Machine Learning Algorithms On The Imbalanced Dataset

Association Rule Mining: Apriori
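
To make the idea concrete, here is a small pure-Python sketch of Apriori frequent-itemset mining. It is not the notebook's implementation (which may rely on a library), and the transactions are made up; only the item labels are borrowed from the column list. The function `apriori` and its parameters are hypothetical names chosen for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count the support of each candidate and keep the frequent ones.
        level = {}
        for c in candidates:
            support = sum(c <= t for t in transactions) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate with an infrequent k-subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Made-up transactions: each set lists the factors flagged for one accident.
transactions = [
    {"Violations", "Fatal or Serious"},
    {"Violations", "Physical Environment", "Fatal or Serious"},
    {"Physical Environment"},
    {"Violations", "Fatal or Serious"},
]
itemsets = apriori(transactions, min_support=0.5)

# Confidence of the rule {Violations} -> {Fatal or Serious}.
pair = frozenset({"Violations", "Fatal or Serious"})
rule_confidence = itemsets[pair] / itemsets[frozenset({"Violations"})]
print(rule_confidence)
```

Confidence is the support of the full itemset divided by the support of the rule's antecedent, so it reads as "of the accidents with the antecedent, what share also had the consequent".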

END =)